Protein Fold Recognition Using Gradient Boost Algorithm
نویسندگان
چکیده
Protein structure prediction is one of the most important and difficult problems in computational molecular biology. Protein threading represents one of the most promising techniques for this problem. One of the critical steps in protein threading, called fold recognition, is to choose the best-fit template for the query protein with the structure to be predicted. The standard method for template selection is to rank candidates according to the z-score of the sequencetemplate alignment. However, the z-score calculation is time-consuming, which greatly hinders structure prediction at a genome scale. In this paper, we present a machine learning approach that treats the fold recognition problem as a regression task and uses a least-squares boosting algorithm (LS Boost) to solve it efficiently. We test our method on Lindahl’s benchmark and compare it with other methods. According to our experimental results we can draw the conclusions that: (1) Machine learning techniques offer an effective way to solve the fold recognition problem. (2) Formulating protein fold recognition as a regression rather than a classification problem leads to a more effective outcome. (3) Importantly, the LS Boost algorithm does not require the calculation of the zscore as an input, and therefore can obtain significant computational savings over standard approaches. (4) The LS Boost algorithm obtains superior accuracy, with less computation for both training and testing, than alternative machine learning approaches such as SVMs and neural networks, which also need not calculate the z-score. What’s more, by using LS Boost algorithm we can identify the more important features in their fold recognition protocol, something that cannot be done using a straightforward SVM approach. keywords: protein structure prediction, z-score, Gradient Boost, fold recognition.
منابع مشابه
Protein fold recognition using the gradient boost algorithm.
Protein structure prediction is one of the most important and difficult problems in computational molecular biology. Protein threading represents one of the most promising techniques for this problem. One of the critical steps in protein threading, called fold recognition, is to choose the best-fit template for the query protein with the structure to be predicted. The standard method for templa...
متن کاملOptimized Edge Detection Algorithm for Face Recognition
Face recognition is one of the most challenging tasks in the field of image processing. This paper presents an optimized edge detection algorithm for the task of face recognition. In this method a gradient based filter using a wide convolution kernel is applied on the image to extract the edges. Later a thinning algorithm optimized for the wide convolution kernel is applied on the extracted edg...
متن کاملیک روش دو مرحلهای برای بازشناسی کلمات دستنوشته فارسی به کمک بلوکبندی تطبیقی گرادیان تصویر
This paper presented a two step method for offline handwritten Farsi word recognition. In first step, in order to improve the recognition accuracy and speed, an algorithm proposed for initial eliminating lexicon entries unlikely to match the input image. For lexicon reduction, the words of lexicon are clustered using ISOCLUS and Hierarchal clustering algorithm. Clustering is based on the featur...
متن کاملMachine Learning Models for Housing Prices Forecasting using Registration Data
This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...
متن کاملSimultaneous Learning and Alignment: Multi-Instance and Multi-Pose Learning
IGERT 2 Electrical Engineering, California Institute of Technology [email protected] 3 Lab of Neuro Imaging University of California, Los Angeles [email protected] { } { } { } In object recognition in general and in face detection in particular, data alignment is necessary to achieve good classification results with certain statistical learning approaches such as Viola-Jones. Data can ...
متن کامل